Problem  Studied 


The problem studied was to significantly enhance the capability and usability of online software
and  data  repositories.  To  this  end,  a  variety  of  approaches  were  employed: 

•  Development  of  mechanisms  for  location  of  resource  replicas,  with  a  guarantee  of 
consistency  and  integrity  for  those  replicas; 

•  Development  of  new  interfaces  for  user  interaction  with  software  and  data  repositories, 
enhancing  the  user's  ability  to  identify  appropriate  software  and/or  data  for  his  needs; 

•  Development of active repository technologies, for example:

•  repositories  that  build  specialized  software  components  on  demand, 

•  repositories  that  deliver  portable,  mobile  software  components  that  can  be 
integrated  with  applications  without  rebuilding  them, 

•  applications  programming  interfaces  to  network-based  computational  servers, 
which  deliver  results  to  common  mathematical  computations  on  demand. 

Summary  of  the  Most  Important  Results 

The  Resource  Cataloging  and  Distribution  System  (RCDS)  was  designed  to  facilitate  scalable 
distribution  of  resources  to  multiple  sites  for  efficiency  and  redundancy,  while  assuring 
authenticity,  integrity,  and  consistency  of  the  resources  and  metadata.  Software  and  data 
repositories  can  use  RCDS  as  a  substrate  to  maintain  replicas  of  their  resources,  while  supporting 
a  decentralized  management  paradigm.  Users  of  those  repositories  benefit  from  more  reliable 
and  efficient  access.  RCDS  employs  digital  signatures  to  protect  against  malicious  or  accidental 
modification of codes and data provided through its service.
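
The integrity check described above can be illustrated with a minimal sketch. It is not the RCDS implementation: RCDS uses digital signatures, whereas this sketch substitutes a keyed hash (HMAC) from the Python standard library so that it stays self-contained. The record layout and the function names sign_record and verify_resource are assumptions made for illustration.

```python
import hashlib
import hmac


def sign_record(key: bytes, name: str, content: bytes) -> dict:
    """Build a catalog record holding the resource digest and a keyed tag.

    Assumption: an HMAC tag stands in for the digital signature used by RCDS.
    """
    digest = hashlib.sha256(content).hexdigest()
    tag = hmac.new(key, f"{name}:{digest}".encode(), hashlib.sha256).hexdigest()
    return {"name": name, "sha256": digest, "tag": tag}


def verify_resource(key: bytes, record: dict, content: bytes) -> bool:
    """Return True only if the record is authentic and the content matches it."""
    expected_tag = hmac.new(
        key, f"{record['name']}:{record['sha256']}".encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected_tag, record["tag"]):
        return False  # the catalog record itself was altered
    actual = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(actual, record["sha256"])  # content check
```

A client would fetch the record and the resource, possibly from different replicas, and accept the resource only when verify_resource returns True.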

The  SONAR  proximity  estimation  system  aids  a  client  in  selecting  a  nearby  replica  of  a  resource, 
by  providing  a  relative  ordering  of  replica  locations  according  to  their  network  proximity  to  the 
client.  This  improves  response  time  and  provides  better  utilization  of  the  network. 
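
The idea of a relative ordering of replicas can be sketched as follows. This is an illustration only, not the SONAR protocol: it assumes that a proximity estimate (here a hypothetical round-trip time in milliseconds) is already available for each replica location, and it simply sorts the locations by that estimate.

```python
from typing import Dict, List


def order_replicas(estimates: Dict[str, float]) -> List[str]:
    """Return replica locations sorted from nearest to farthest.

    `estimates` maps a replica location to an assumed proximity estimate
    (smaller means closer). Ties are broken by name so that the ordering
    is deterministic.
    """
    return sorted(estimates, key=lambda host: (estimates[host], host))
```

A client would then try the first location in the returned list and fall back to later ones if it is unreachable.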

Another package was developed to provide a suite of software access control and licensing tools.
This  package  allows  software  maintainers  to  select  from  a  variety  of  user  authentication  methods 
to  provide  access  control  to  their  software,  thus  protecting  their  intellectual  property.  In  addition, 
the  package  allows  repository  maintainers  to  require  registration  and/or  a  license  agreement 
before  the  software  can  be  downloaded. 

In  the  area  of  improved  interfaces  for  software  repositories,  the  Approximation  Wizard  is  an 
example of "data-driven searching"; the user supplies a data file and the Approximation Wizard
deduces  its  format  and  runs  a  battery  of  tests  over  it,  looking  for  distinctive  features  such  as 
monotonicity,  noise,  strong  frequency  components,  and  so  on.  After  confirming  those 
diagnostics  with  the  user,  it  selects  specific  algorithms  that  are  appropriate  for  the  data, 
supplying  graphical  displays  of  the  results. 
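
The kind of diagnostic test mentioned above can be sketched in a few lines. The functions below are hypothetical and are not taken from the Approximation Wizard; they show one simple way to test a data series for monotonicity and to count direction changes, a crude indicator of noise or oscillation.

```python
from typing import Dict, Sequence, Union


def is_monotonic(values: Sequence[float]) -> bool:
    """True if the series never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def direction_changes(values: Sequence[float]) -> int:
    """Count how often the series switches between rising and falling."""
    diffs = [b - a for a, b in zip(values, values[1:]) if b != a]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 > 0) != (d2 > 0))


def diagnose(values: Sequence[float]) -> Dict[str, Union[bool, int]]:
    """Collect the diagnostics that would be confirmed with the user."""
    return {
        "monotonic": is_monotonic(values),
        "direction_changes": direction_changes(values),
    }
```

A series reported as monotonic with no direction changes would, for example, be a candidate for monotonicity-preserving approximation methods.
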

The Guide to Available Mathematical Software (GAMS) is a cross-index and virtual repository
of public-domain and commercial software components. GAMS provides enhanced indexing and
content-specific  search  capabilities  for  a  superset  of  the  netlib  repository. 

The  Matrix  Market  is  a  database  of  artifacts  used  in  testing  and  evaluating  algorithms  and 
software  for  numerical  linear  algebra.  The  system  provides  search  facilities  for  matrix  data, 
visualizations  of  matrix  structure,  and  facilities  for  on-demand  generation  of  matrices  to  meet 
user-specified  criteria. 
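
On-demand generation of a test matrix can be illustrated with a small sketch. The generator below is hypothetical and is not the Matrix Market code; it writes a tridiagonal matrix of user-specified order in the Matrix Market coordinate exchange format (a header line, a size line, then one "row column value" line per entry with 1-based indices).

```python
def tridiagonal_mm(n: int, diag: float, off: float) -> str:
    """Return an n-by-n tridiagonal matrix as Matrix Market coordinate text."""
    if n < 1:
        raise ValueError("matrix order must be at least 1")
    entries = []
    for i in range(1, n + 1):
        if i > 1:
            entries.append((i, i - 1, off))  # sub-diagonal entry
        entries.append((i, i, diag))         # diagonal entry
        if i < n:
            entries.append((i, i + 1, off))  # super-diagonal entry
    lines = [
        "%%MatrixMarket matrix coordinate real general",
        f"{n} {n} {len(entries)}",
    ]
    lines += [f"{r} {c} {v}" for r, c, v in entries]
    return "\n".join(lines) + "\n"
```

Calling tridiagonal_mm(3, 2.0, -1.0) yields a 3-by-3 matrix with seven stored entries.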

The  Digital  Library  of  Mathematical  Functions  provides  online  reference  data  for  the  special 
functions  of  applied  mathematics.  The  emphasis  is  on  development  of  active  components,  such 
as  interactive  visualizations,  identities  that  can  be  cut-and-pasted  into  computer  algebra  systems, 
and  tables  of  certified  data  that  are  generated  on  demand.  This  is  an  attempt  to  address  the 
challenge of providing effective search techniques for specialized mathematical reference data,
without  extensive  auxiliary  text. 

Several  of  these  repositories  have  adopted  the  Basic  Interoperability  Data  Model  (BIDM) 
developed  by  the  Reuse  Library  Interoperability  Group  (RIG),  of  which  the  project  participants 
have  been  active  members.  The  BIDM  is  an  IEEE  standard  that  specifies  a  minimal  set  of 
catalog  information  that  a  software  repository  should  provide  about  its  software  resources  in 
order  to  interoperate  with  other  repositories.  The  BIDM  is  expressed  in  terms  of  an  extended 
entity-relationship  data  model  that  defines  classes  for  assets  (the  software  entities),  the  individual 
elements  making  up  assets  (i.e.,  files),  reuse  libraries  (i.e.,  repositories)  that  provide  assets,  and 
organizations  that  develop  and  manage  libraries  and  assets.  The  BIDM  helps  provide  semantic 
interoperability  between  different  repositories. 
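
The four classes named above can be pictured with a minimal sketch. The class names follow the description in the text, but every attribute name and the find helper are assumptions made for illustration; they are not taken from the IEEE standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Organization:
    """An organization that develops or manages libraries and assets."""
    name: str


@dataclass
class Element:
    """An individual file that is part of an asset."""
    filename: str


@dataclass
class Asset:
    """A software entity described by a catalog record."""
    name: str
    elements: List[Element] = field(default_factory=list)
    developed_by: List[Organization] = field(default_factory=list)


@dataclass
class Library:
    """A reuse library (repository) that provides assets."""
    name: str
    assets: List[Asset] = field(default_factory=list)
    managed_by: List[Organization] = field(default_factory=list)

    def find(self, asset_name: str) -> List[Asset]:
        """Return the assets in this library with the given name."""
        return [a for a in self.assets if a.name == asset_name]
```

Two repositories that agree on such a minimal record structure can exchange catalog entries without knowing each other's internal storage format.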

A  toolkit  known  as  Repository  in  a  Box  (RIB)  enables  software  repository  developers  to  create 
and  maintain  software  catalog  records  using  the  BIDM,  to  exchange  those  records  as  well  as 
software  resources  themselves  with  other  repositories,  and  to  provide  a  user  interface  for  their 
software  catalog.  RIB  software  catalogs  are  maintained  and  accessed  using  a  WWW  interface. 
RIB  administrative  functions  available  through  this  interface  allow  maintenance  of  software 
catalog  records,  exchange  of  catalog  records  with  other  repositories,  administration  of  browsing 
and  searching  interfaces,  discipline-oriented  and  site-specific  customization,  file  uploading,  and 
replication.  Multiple  repositories  may  be  maintained  at  the  same  site  using  a  single  RIB 
installation. 

Java  has  proven  to  be  very  useful  in  these  efforts.  For  GAMS,  we  have  been  able  to  develop  a 
complete  portable  user  interface  in  Java,  which  provides  a  variety  of  capabilities  not  present  in 
the hypertext version. For the Matrix Market, we designed a series of applets that allow
for  the  generation  of  matrix  test  data  in  a  user's  browser. 

This  work  has  demonstrated  the  great  promise  of  Java  in  the  evolution  of  software  repositories. 
Among  its  key  features  is  the  possibility  of  truly  mobile  mathematical  software  applications  and 
components.  The  use  of  such  components  could  have  a  significant  impact  on  how  such  software 
is  distributed  and  used.  To  explore  this  further,  we  undertook  a  systematic  study  of  the  suitability 
of  Java  for  numerical  computation.  This  study  paints  a  mixed  picture  of  the  capabilities  of  Java 
and  extant  Java  Virtual  Machines  (JVMs).  Among  the  problems  are  serious  performance 
barriers, a lack of key facilities enabling efficient scientific software development, and the lack
of a credible base of software components that can support serious numerical computation. To
help make Sun and others aware of the needs of the high-performance computing community, we, in
collaboration  with  Sun,  Intel,  IBM,  The  MathWorks,  NAG,  Visual  Numerics,  Syracuse 
University,  the  University  of  California  at  Berkeley,  the  University  of  North  Carolina,  Indiana 
University,  and  others,  formed  the  Java  Grande  Forum.  Ron  Boisvert  and  Roldan  Pozo  of  the 
National Institute of Standards and Technology (NIST) chair the Forum's Numerics Working Group.
This  group  is  working  to  improve  the  Java  language  and  its  environment  for  numeric-intensive 
computations  through  both  changes  in  the  Java  language  and  its  semantics,  as  well  as  the 
development  of  standard  class  libraries  for  core  numerical  functions. 

Another  key  technology  for  search  and  retrieval,  as  well  as  for  interchange  of  active 
mathematical  data,  is  the  development  of  ontologies  and  related  semantic-based  markup  systems 
for  mathematical  data.  An  example  of  this  work’s  importance  is  its  use  in  the  Digital  Library  of 
Mathematical  Functions.  The  W3C's  adoption  of  MathML  has  provided  an  important  first  step  in 
this  direction;  however,  the  semantic  markup  in  MathML  is  too  weak  to  support  the  description 
of  higher  mathematical  concepts  necessary  for  the  interchange  of  data  among  computer  algebra 
systems.  We  are  working  with  the  European  OpenMath  Consortium  and  the  North  American 
OpenMath  Initiative  (NAOMI)  to  develop  extensions  to  MathML,  which  provide  rich  semantic- 
based  markup  for  mathematical  data,  as  well  as  associated  tools. 

The  RCDS  software  is  being  used  to  support  a  variety  of  ongoing  research  projects,  including 
projects  related  to  software  reuse,  as  well  as  other  applications  of  highly  scalable  resolution  and 
replication  services: 

•  The  Program  Builder  is  a  tool  that  assembles  computer  programs  out  of  source  code 
components obtained from repositories. It can download a description of a program that lists
the components needed to build it and the instructions for building it,
retrieve  the  necessary  components  from  various  locations,  verify  the  authenticity  and 
integrity  of  each,  configure  them  according  to  the  target  environment,  compile  them,  and 
optionally  install  them.  The  program  builder  uses  RCDS  to  store  and  retrieve  catalog 
information  for  the  various  packages,  and  uses  signed  RCDS  catalog  information  to 
verify  the  authenticity  and  integrity  of  the  packages  it  uses. 

•  Similar  to  the  Program  Builder,  Netbuild  is  a  link-editor  that  allows  seamless  access  to 
remote  repositories  of  pre-compiled  subroutine  libraries,  such  as  the  mathematical 
libraries  in  the  Netlib  repository.  Netbuild  uses  RCDS  to  catalog  its  subroutine  libraries, 
to  identify  which  versions  of  the  libraries  are  appropriate  for  the  target  platform,  and  to 
verify  the  authenticity  and  integrity  of  the  libraries  before  they  are  linked. 

•  The  Scalable  Networked  Information  Processing  Environment  (SNIPE)  is  a  large-scale, 
fault-tolerant,  distributed  computing  environment  layered  on  top  of  RCDS.  It  uses  the 
Resource  Catalog  component  of  RCDS  to  store  metadata  about  computing  resources, 
processes,  resource  managers,  multicast  groups,  programs,  data  files,  and  checkpointed 
programs.  The  SNIPE  environment  utilizes  RCDS's  fault  tolerance  to  allow  it  to  survive 
multiple  host  failures.  Processes  can  migrate  from  one  host  to  another  for  load  balancing 
or  to  avoid  host  failures;  their  communications  peers  will  automatically  re-establish 
communications  at  the  new  location.  As  in  other  projects,  signed  RCDS  metadata  is  used 
to verify authenticity and integrity of programs, data files, and checkpoint files. RCDS
metadata is also used to list communications ports by which a process can be reached,
thus  allowing  a  peer  to  choose  the  best  available  path  when  establishing  a  connection. 

•  The  Resource  Catalog  portion  of  RCDS  is  also  used  as  a  substrate  in  implementations  of 
MPI_Connect.  MPI_Connect  is  being  used  by  CEWES  and  ASC  DoD  MSRCs.  It  has 
also  been  used  by  the  High  Performance  Computing  Center  in  Stuttgart,  Germany,  and 
the Paderborn Center for Parallel Computing (P2C).

•  Harness  is  another  distributed  computing  environment  that  utilizes  RCDS  in  a  similar 
fashion  to  SNIPE.  Harness  uses  the  Resource  Catalog  component  of  RCDS  to  provide  a 
very  flexible  and  highly  customizable  metacomputing  environment  through  the  extensive 
use of RCDS-cataloged plug-in modules. These allow a "building block" approach to the
construction of one or more "virtual machines," each consisting of computing resources,
communications resources, and software repository resources to be employed in a
computation. Harness also uses RCDS to facilitate the merging and splitting of virtual machines
and communication between virtual machines.

•  NetSolve provides a foundation for network-based computational servers, allowing users
to access computational resources, such as hardware and software, that are distributed across
the network. NetSolve is investigating the use of RCDS as a catalog for computational
resources. 

Experience  derived  from  the  use  of  the  Resource  Catalog  resolution  subsystem  of  RCDS,  as  well 
as  from  similar  systems,  is  being  used  to  design  a  highly  scalable  resolution  system  for  Uniform 
Resource  Names  (URNs)  and  other  kinds  of  URIs.  URNs  will  serve  as  very  long-term  stable 
names  for  Internet-accessible  resources,  in  contrast  to  URLs,  which  change  if  the  resource  moves 
from  one  host  to  another,  or  if  the  domain  or  repository  is  reorganized.  URNs  have  been 
standardized  by  the  Internet  Engineering  Task  Force  (IETF). 
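
The difference between a stable name and a changeable location can be shown with a toy resolver. This sketch is not the resolution system under design; the class UrnResolver, its methods, and the example names are all hypothetical. It only shows that a client holding a URN is unaffected when the underlying URL changes.

```python
from typing import Dict, List


class UrnResolver:
    """Map long-lived names (URNs) to their current locations (URLs)."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, urn: str, url: str) -> None:
        """Record that a copy of the resource named `urn` lives at `url`."""
        urls = self._locations.setdefault(urn, [])
        if url not in urls:
            urls.append(url)

    def relocate(self, urn: str, old_url: str, new_url: str) -> None:
        """Replace one location; the URN itself never changes."""
        urls = self._locations[urn]
        urls[urls.index(old_url)] = new_url

    def resolve(self, urn: str) -> List[str]:
        """Return a copy of the current locations for `urn` (empty if unknown)."""
        return list(self._locations.get(urn, []))
```

After a call to relocate, a client that stored only the URN still reaches the resource, whereas a stored copy of the old URL would be stale.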

The  IETF  is  also  considering  standardization  of  a  general  purpose  web  replication/caching 
infrastructure and general-purpose name resolution systems (tentatively known as RESCAP),
based  on  experience  with  the  RCDS  work.  Anticipated  applications  of  this  technology  include 
the  use  of  RCDS-like  resolution  services  for  content  or  protocol  negotiation  between  Internet 
clients  and  servers.  An  example  of  such  an  application  is  determining  the  capabilities  of  an 
electronic mail recipient's user agent (e.g. "Does joe@xyz.com accept PowerPoint?"), or
mapping an E.164 telephone number into one or more recipient-specified service locations,
which  provide  routing  for  a  voice,  fax,  or  pager  call  to  that  number. 

The  RIB  toolkit  is  currently  deployed  and  in  use  at  three  of  the  Department  of  Defense  Major 
Shared Resource Centers (MSRCs) for setting up repositories for several of the DoD
Computational  Technology  Areas  (CTAs)  as  part  of  the  High  Performance  Computing 
Modernization  Program.  Other  sites  that  are  using  RIB  to  set  up  software  repositories  include  the 
NASA  Earth  and  Space  Sciences  program  and  the  NSF-sponsored  metacomputing  centers  at 
NCSA and at San Diego. Also, several domain-specific repositories are maintained here at the
University  of  Tennessee  using  the  latest  version  of  RIB. 

RIB  has  proven  to  be  a  successful  application  for  allowing  metadata  interoperation  across  the 
Internet, which promotes software sharing and reuse. Development of the toolkit has resulted in a
stable release (ver. 2.1) that has undergone testing on IRIX, SunOS 4.x, Windows, AIX, and
Linux. While no time frame has been established for the next RIB release, development is
currently underway, including the addition of many new features, some of which have come at
the  request  of  current  users.  Links  to  all  of  the  publicly  available  repositories  currently  utilizing 
the  RIB  toolkit  can  be  found  at  http://www.nhse.org/RIB/. 

The  concepts  developed  in  the  prototype  Digital  Library  of  Mathematical  Functions  will  be 
applied  to  the  development  of  a  complete  Web-based  mathematical  reference  service  for  special 
functions  to  be  maintained  at  NIST.  This  service  will  be  a  model  for  other  highly  interactive 
mathematical  reference  data  projects. 

Improvements  to  the  Java  language  and  its  environment  will  be  institutionalized  via  the 
standardization  process.  Vendors  of  Java  compilers  and  JVMs  will  be  required  to  support  such 
changes,  resulting  in  significant  performance  enhancements  for  scientific  applications. 
Standardized  numerical  APIs  will  increase  the  ease  of  coding,  portability,  and  reliability  of  Java 
applications. 